Improving fuzzy matching through syntactic knowledge

Authors

Tom Vanallemeersch

Vincent Vandeghinste

Abstract

Fuzzy matching in translation memories (TM) is mostly string-based in current CAT tools. These tools look for TM sentences highly similar to an input sentence, using edit distance to detect the differences between sentences. Current CAT tools use limited or no linguistic knowledge in this procedure. In the recently started SCATE project, which aims at improving translators’ efficiency, we apply syntactic fuzzy matching in order to detect abstract similarities and to increase the number of fuzzy matches. We parse TM sentences in order to create hierarchical structures identifying constituents and/or dependencies. We calculate TER (Translation Error Rate) between an existing human translation of an input sentence and the translation of its fuzzy match in the TM. This allows us to assess the usefulness of syntactic matching with respect to string-based matching. First results hint at the potential of syntactic matching to lower TER scores for sentences with a low match score in a string-based setting.
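
The string-based baseline described in the abstract can be illustrated with a minimal sketch. It is not the SCATE implementation: the word-level tokenization, the normalization of edit distance by the longer sentence length, and the names `edit_distance`, `fuzzy_score` and `best_match` are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_score(source: str, candidate: str) -> float:
    """Assumed match score: 1 - distance / length of the longer sentence."""
    a, b = source.lower().split(), candidate.lower().split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def best_match(source: str,
               tm: List[Tuple[str, str]],
               threshold: float = 0.0) -> Optional[Tuple[float, str, str]]:
    """Return (score, TM source, TM target) of the best match, or None."""
    scored = [(fuzzy_score(source, src), src, tgt) for src, tgt in tm]
    scored = [item for item in scored if item[0] >= threshold]
    return max(scored, key=lambda item: item[0]) if scored else None


if __name__ == "__main__":
    # Hypothetical two-entry translation memory (source, target).
    tm = [
        ("the blue car stops", "de blauwe auto stopt"),
        ("a dog barks loudly", "een hond blaft luid"),
    ]
    print(best_match("the red car stops", tm, threshold=0.5))
```

A purely string-based score of this kind treats every differing word as equally costly, which is the limitation that syntactic matching is meant to address.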


Similar resources

Semantics-based pretranslation for SMT using fuzzy matches

Semantic knowledge has been adopted recently for SMT preprocessing, decoding and evaluation, in order to be able to compare sentences based on their meaning rather than on mere lexical and syntactic similarity. Little attention has been paid to semantic knowledge in the context of integrating fuzzy matches from a translation memory with SMT. We present work in progress which focuses on semantic...


Case-based Reasoning for Diagnosis of Stress using Enhanced Cosine and Fuzzy Similarity

Intelligent analysis of heterogeneous data and information sources for efficient decision support presents an interesting yet challenging task in clinical environments. This is particularly the case in stress medicine where digital patient records are becoming popular which contain not only lengthy time series measurements but also unstructured textual documents expressed in form of natural lan...


Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax

Due to its explicit modeling of the grammaticality of the output via target-side syntax, the string-to-tree model has been shown to be one of the most successful syntax-based translation models. However, a major limitation of this model is that it does not utilize any useful syntactic information on the source side. In this paper, we analyze the difficulties of incorporating source syntax in a ...


Simple but Effective Approaches to Improving Tree-to-tree Model

The tree-to-tree translation model is widely studied in statistical machine translation (SMT) and is believed to have much potential to achieve promising translation quality. However, the existing models still suffer from unsatisfactory performance due to limitations in both the rule extraction and the decoding procedure. According to our analysis and experiments, we have found that tree-to-tree mode...


Transformation Rules for Knowledge-Based Pattern Matching

Many AI tasks require determining whether two knowledge representations encode the same knowledge. For example, rule-based classification requires matching rule antecedents with working memory; information retrieval requires matching queries with documents; and some knowledge-acquisition tasks require matching new information with already encoded knowledge to expand upon and debug both of them....



Publication date: 2015
